CS131 Summary
宋富老师的Compiler课,东西很多,忘得快。记点框架,免得全忘了。
教材:Compilers: Principles, Techniques, & Tools 2nd
Lexical Analysis
正则表达式
Atomic reg: for every a in $\Sigma$ , Epsilon: $\epsilon$
Compound reg: $r+s,rs,r^*$
Transition diagrams (a graph representation of reg)
自动机
Deterministic, DFA( widely used in lexical analyzers ); Non-Deterministic NFA
Two algorithm:
- Reg=>NFA=>DFA
- Reg=>DFA (more complicated)
Reg => NFA
Allows $\epsilon$ transitions, move to another state without consuming any symbol.
Crafting every symbols one by one with $L(\epsilon),L(a),L(s)\cup L(t)$
NFA => DFA
Calculate $\epsilon$ passing and construct all possible subsets and corresponding input symbol.
Rename each subset and reassemble with input symble.
Reg => DFA
Evaluate firstpos, latpos, nullable. Draw trees and construct DFA.
Abstractive and a little complicated, needs reviewing according to tutorial in youtube.
Minimize DFA
Forgot.
Error Handling
- Panic Mode
- Phrase Level
Context free language and parsing
Eliminate ambiguity
- Grammar rewrite
- Enforce precedence and associativity
Top-down parsing
Recursive-descent paring
Backtracking, general, not efficient
Eliminate left recursion:
- Immediate recursions
- All left recursions
Predictive parsing
No backtracking, efficient, LL(k)
Left-factoring
Construct LL(1) parsing table
For each rule $A\rightarrow \alpha$ :
- terminal a in FIRST($\alpha$)? Add $A\rightarrow \alpha$ to M[A,a]
- $\epsilon$ in $FIRST(\alpha)$? Add $A\rightarrow \alpha$ to M[A,b] for all terminals b in $FOLLOW(A)$
- $\epsilon$ in $FIRST(\alpha)$? and $ in $FOLLOW(A)$? Add $A\rightarrow \alpha$ to M[A,$]
- All undefined entries are errors
Not a LL(1) grammar and error recovery.
Bottom-up parsing
Operator precedence parsing
Not much impression, maybe not important.
Construct LR(0) parsing table
Long journey, need tutorial on youtube
SLR(1) parsing table
建表时,reduce操作只向$Follow(A)$中加.
Cautions to ambiguous grammar (left/right associative)
LR(1)
对一个state,包含所有可能的下一个input.
LALR(1) parsing table
引入reduce-redure conflict.
Intermediate represnetation
Abstract syntax trees
保留代码和流程,不能遍历、修改。
Directed acyclic graphs
AST合并同值节点
Control flow graphs
冗余、保守的源信息
Single static assignment
每个变量只定义、赋值一次,有利于优化
Stack machine code
容易生成,兼容性好;难优化,难重用
3-address code
适合多个level的IR;占空间,语义消失
Compilation strategies
- Hight-level
- Mid-level
- Low-level
以上是期中前的内容,下面是期中后的内容